Abstract

The paper presents a language model that develops syntactic structure and
uses it to extract meaningful information from the word history, thus
enabling the use of long-distance dependencies. The model assigns a
probability to every joint sequence of words and binary parse structure
with headword annotation, and it operates in a left-to-right manner, which
makes it usable for automatic speech recognition. The model, its
probabilistic parameterization, and a set of experiments meant to evaluate
its predictive power are presented; an improvement over standard trigram
modeling is achieved.

1 Introduction 

The main goal of the present work is to develop a language model that uses
syntactic structure to model long-distance dependencies. During the Summer
1996 DoD Workshop a similar attempt was made by the dependency modeling
group. The model we present is closely related to the one investigated in
(Chelba et al., 1997), but differs in a few important aspects:

• our model operates in a left-to-right manner, allowing the decoding of
word lattices, as opposed to the one referred to previously, where only
whole sentences could be processed, thus reducing its applicability to
n-best list re-scoring; the syntactic structure is developed as a model
component;

• our model is a factored version of the one in (Chelba et al., 1997), thus
enabling the calculation of the joint probability of words and parse
structure; this was not possible in the previous case due to the huge
computational complexity of the model.

Our model develops syntactic structure incrementally while traversing the
sentence from left to right. This is the main difference between our
approach and other approaches to statistical natural language parsing. Our
parsing strategy is similar to the incremental syntax ones proposed
relatively recently in the linguistic community (Philips, 1996). The
probabilistic model, its parameterization and a few experiments that are
meant to evaluate its potential for speech recognition are presented.




the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS after



Figure 1: Partial parse 

2 The Basic Idea and Terminology 

Consider predicting the word after in the sentence: 
the contract ended with a loss of 7 cents 
after trading as low as 89 cents. 

A 3-gram approach would predict after from (7, cents) whereas it is
intuitively clear that the strongest predictor would be ended, which is
outside the reach of even 7-grams. Our assumption is that what enables
humans to make a good prediction of after is the syntactic structure in the
past. The linguistically correct partial parse of the word history when
predicting after is shown in Figure 1. The word ended is called the
headword of the constituent (ended (with (...))) and ended is an exposed
headword when predicting after: it is the topmost headword in the largest
constituent that contains it. The syntactic structure in the past filters
out irrelevant words and points to the important ones, thus enabling the
use of long-distance information when predicting the next word.

Our model will attempt to build the syntactic 
structure incrementally while traversing the sen- 
tence left-to-right. The model will assign a probabil- 
ity P(W, T) to every sentence W with every possible 
POStag assignment, binary branching parse, non- 
terminal label and headword annotation for every 
constituent of T. 

Let W be a sentence of length n words to which we have prepended <s> and
appended </s> so that $w_0$ = <s> and $w_{n+1}$ = </s>. Let $W_k$ be the
word k-prefix $w_0 \ldots w_k$ of the sentence and $W_kT_k$




(<s>, SB) ... (w_p, t_p) (w_{p+1}, t_{p+1}) ... (w_k, t_k) w_{k+1} ... </s>

Figure 2: A word-parse k-prefix 




(<s>, SB) (w_1, t_1) ... (w_n, t_n) (</s>, SE)

Figure 3: Complete parse 

the word-parse k-prefix. To stress this point, a word-parse k-prefix
contains, for a given parse, only those binary subtrees whose span is
completely included in the word k-prefix, excluding $w_0$ = <s>. Single
words along with their POStag can be regarded as root-only trees. Figure 2
shows a word-parse k-prefix; h_0 ... h_{-m} are the exposed heads, each
head being a pair (headword, non-terminal label), or (word, POStag) in the
case of a root-only tree. A complete parse (Figure 3) is any binary parse
of the $(w_1, t_1) \ldots (w_n, t_n)$ (</s>, SE) sequence with the
restriction that (</s>, TOP') is the only allowed head. Note that
$((w_1, t_1) \ldots (w_n, t_n))$ needn't be a constituent, but for the
parses where it is, there is no restriction on which of its words is the
headword or what is the non-terminal label that accompanies the headword.

The model will operate by means of three mod- 
ules: 

• WORD-PREDICTOR predicts the next word $w_{k+1}$ given the word-parse
k-prefix and then passes control to the TAGGER;

• TAGGER predicts the POStag $t_{k+1}$ of the next word given the
word-parse k-prefix and the newly predicted word and then passes control to
the PARSER;

• PARSER grows the already existing binary branching structure by
repeatedly generating the transitions (unary, NTlabel),
(adjoin-left, NTlabel) or (adjoin-right, NTlabel) until it passes control
to the WORD-PREDICTOR by taking a null transition. NTlabel is the
non-terminal label assigned to the newly built constituent and
{left, right} specifies where the new headword is inherited from.
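
The effect of the three PARSER transitions on the list of exposed heads can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the `Head` type, the function names and the labels in the usage note are assumptions made for the example, and only the head annotation is tracked, not the subtrees themselves.

```python
from typing import List, NamedTuple


class Head(NamedTuple):
    word: str  # headword
    tag: str   # POStag for a root-only tree, non-terminal label otherwise


def unary(heads: List[Head], nt_label: str) -> List[Head]:
    """Relabel the most recent exposed head h_0 with a non-terminal label."""
    *rest, h0 = heads
    return rest + [Head(h0.word, nt_label)]


def adjoin_left(heads: List[Head], nt_label: str) -> List[Head]:
    """Merge h_{-1} and h_0; the headword is inherited from the left child."""
    *rest, h_m1, _h0 = heads
    return rest + [Head(h_m1.word, nt_label)]


def adjoin_right(heads: List[Head], nt_label: str) -> List[Head]:
    """Merge h_{-1} and h_0; the headword is inherited from the right child."""
    *rest, _h_m1, h0 = heads
    return rest + [Head(h0.word, nt_label)]
```

For instance, with the exposed heads (<s>, SB), (ended, VBD), (with, PP), calling `adjoin_left(heads, "VP")` leaves (<s>, SB), (ended, VP): the number of exposed heads drops by one and ended stays exposed.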

The operations performed by the PARSER are illustrated in Figures 4-6 and
they ensure that all possible binary branching parses with all possible


h_{-2}   h_{-1}   h_0

T_{-m}

Figure 4: Before an adjoin operation

h'_{-1} = h_{-2}   h'_0 = (h_{-1}.word, NTlabel)

T'_{-m+1} <- <s>

Figure 5: Result of adjoin-left under NTlabel

headword and non-terminal label assignments for the $w_1 \ldots w_k$ word
sequence can be generated. The following algorithm formalizes the above
description of the sequential generation of a sentence with a complete
parse.

Transition t;                       // a PARSER transition
predict (<s>, SB);
do {
    // WORD-PREDICTOR and TAGGER
    predict (next_word, POStag);
    // PARSER
    do {
        if (h_{-1}.word != <s>) {
            if (h_0.word == </s>)
                t = (adjoin-right, TOP');
            else {
                if (h_0.tag == NTlabel)
                    t = [(adjoin-{left,right}, NTlabel), null];
                else
                    t = [(unary, NTlabel),
                         (adjoin-{left,right}, NTlabel), null];
            }
        }
        else {
            if (h_0.tag == NTlabel)
                t = null;
            else
                t = [(unary, NTlabel), null];
        }
    } while (t != null)             // done PARSER
} while (!(h_0.word == </s> && h_{-1}.word == <s>))
t = (adjoin-right, TOP);            // adjoin <s>_SB; DONE
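
The case analysis inside the PARSER loop can be written as a function that lists the transitions available in a given state. The sketch below mirrors the branches of the algorithm above. The function name, the use of `None` for the null transition and the convention that a head is a leaf exactly when its tag belongs to the POStag vocabulary are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Set, Tuple

Head = Tuple[str, str]                   # (headword, POStag or non-terminal label)
Transition = Optional[Tuple[str, str]]   # None stands for the null transition


def allowed_transitions(h0: Head, h_m1: Head, pos_tags: Set[str],
                        nt_labels: Sequence[str]) -> List[Transition]:
    """Return the PARSER transitions permitted for exposed heads h_0, h_{-1}."""
    unary = [("unary", nt) for nt in nt_labels]
    adjoin = [(op, nt) for op in ("adjoin-left", "adjoin-right")
              for nt in nt_labels]
    h0_is_leaf = h0[1] in pos_tags       # assumption: leaves carry a POStag
    if h_m1[0] != "<s>":
        if h0[0] == "</s>":
            return [("adjoin-right", "TOP'")]
        if not h0_is_leaf:
            return adjoin + [None]       # unary already used or not applicable
        return unary + adjoin + [None]
    # h_{-1} is <s>: adjoining it is postponed to the very last step
    if not h0_is_leaf:
        return [None]
    return unary + [None]
```

The function only enumerates candidates; choosing among them is the job of the PARSER probability model described in section 3.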

The unary transition is allowed only when the most recent exposed head is a
leaf of the tree (a regular word along with its POStag), hence it can be
taken at most once at a given position in the input word string. The second
subtree in Figure 2 provides an example of a unary transition followed by a
null transition.

h'_0 = (h_0.word, NTlabel)

T'_{-m+1} <- <s>

Figure 6: Result of adjoin-right under NTlabel

It is easy to see that any given word sequence with a possible parse and
headword annotation is generated by a unique sequence of model actions.
This will prove very useful in initializing our model parameters from a
treebank; see section 3.5.



3 Probabilistic Model 

The probability P(W, T) of a word sequence W and a complete parse T can be
broken into:

$$P(W,T) = \prod_{k=1}^{n+1} \Big[ P(w_k / W_{k-1}T_{k-1}) \cdot P(t_k / W_{k-1}T_{k-1}, w_k) \cdot \prod_{i=1}^{N_k} P(p_i^k / W_{k-1}T_{k-1}, w_k, t_k, p_1^k \ldots p_{i-1}^k) \Big] \qquad (1)$$

where:

• $W_{k-1}T_{k-1}$ is the word-parse (k - 1)-prefix

• $w_k$ is the word predicted by WORD-PREDICTOR

• $t_k$ is the tag assigned to $w_k$ by the TAGGER

• $N_k - 1$ is the number of operations the PARSER executes before passing
control to the WORD-PREDICTOR (the $N_k$-th operation at position k is the
null transition); $N_k$ is a function of T

• $p_i^k$ denotes the i-th PARSER operation carried out at position k in
the word string;
$p_1^k \in$ {(unary, NTlabel), (adjoin-left, NTlabel),
(adjoin-right, NTlabel), null},
$p_i^k \in$ {(adjoin-left, NTlabel), (adjoin-right, NTlabel)},
$1 < i < N_k$,
$p_i^k$ = null, $i = N_k$



Our model is based on three probabilities:

$$P(w_k / W_{k-1}T_{k-1}) \qquad (2)$$

$$P(t_k / w_k, W_{k-1}T_{k-1}) \qquad (3)$$

$$P(p_i^k / w_k, t_k, W_{k-1}T_{k-1}, p_1^k \ldots p_{i-1}^k) \qquad (4)$$



As can be seen, $(w_k, t_k, W_{k-1}T_{k-1}, p_1^k \ldots p_{i-1}^k)$ is one
of the $N_k$ word-parse k-prefixes $W_kT_k$ at position k in the sentence,
$i = 1, \ldots, N_k$.
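
As a small worked illustration of the factorization of P(W, T), the joint probability of a sentence and its parse is the product, over word positions, of one WORD-PREDICTOR probability, one TAGGER probability and the probabilities of the PARSER operations taken at that position (the last of which is the null transition). The sketch below accumulates the product in the log domain; the input layout and the function name are assumptions made for the example.

```python
import math
from typing import Iterable, Sequence, Tuple

# One entry per position k: (P(w_k/...), P(t_k/...), [P(p_1^k/...), ..., P(p_{N_k}^k/...)])
Position = Tuple[float, float, Sequence[float]]


def log_joint_probability(positions: Iterable[Position]) -> float:
    """Return ln P(W, T) as the sum of the log-probabilities of all model actions."""
    total = 0.0
    for p_word, p_tag, p_parser_ops in positions:
        total += math.log(p_word) + math.log(p_tag)
        total += sum(math.log(p) for p in p_parser_ops)
    return total
```

Working with logarithms avoids numerical underflow on long sentences and matches the ln(P(W, T)) score used to rank hypotheses during search.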

To ensure a proper probabilistic model (1) we have to make sure that (2),
(3) and (4) are well defined conditional probabilities and that the model
halts with probability one. Consequently, certain PARSER and
WORD-PREDICTOR probabilities must be given specific values:

• P(null / $W_kT_k$) = 1, if h_{-1}.word = <s> and h_0 ≠ (</s>, TOP'), that
is, before predicting </s>; this ensures that (<s>, SB) is adjoined in the
last step of the parsing process;

• P((adjoin-right, TOP) / $W_kT_k$) = 1, if h_0 = (</s>, TOP') and
h_{-1}.word = <s>, and
P((adjoin-right, TOP') / $W_kT_k$) = 1, if h_0 = (</s>, TOP') and
h_{-1}.word ≠ <s>; these ensure that the parse generated by our model is
consistent with the definition of a complete parse;

• P((unary, NTlabel) / $W_kT_k$) = 0, if h_0.tag ≠ POStag; this ensures
correct treatment of unary productions;

• $\exists \epsilon > 0$ such that, for all $W_{k-1}T_{k-1}$,
P($w_k$ = </s> / $W_{k-1}T_{k-1}$) > $\epsilon$; this ensures that the
model halts with probability one.

The word-predictor model (2) predicts the next word based on the preceding
2 exposed heads, thus making the following equivalence classification:

$$P(w_k / W_{k-1}T_{k-1}) = P(w_k / h_0, h_{-1})$$

After experimenting with several equivalence classifications of the
word-parse prefix for the tagger model, the conditioning part of model (3)
was reduced to using the word to be tagged and the tags of the two most
recent exposed heads:

$$P(t_k / w_k, W_{k-1}T_{k-1}) = P(t_k / w_k, h_0.tag, h_{-1}.tag)$$

Model (4) assigns probability to different parses of the word k-prefix by
chaining the elementary operations described above. The workings of the
parser module are similar to those of Spatter (Jelinek et al., 1994). The
equivalence classification of the $W_kT_k$ word-parse we used for the
parser model (4) was the same as the one used in (Collins, 1996):

$$P(p_i^k / W_kT_k) = P(p_i^k / h_0, h_{-1})$$

It is worth noting that if the binary branching 
structure developed by the parser were always right- 
branching and we mapped the POStag and non- 
terminal label vocabularies to a single type then our 
model would be equivalent to a trigram language 
model. 

3.1 Modeling Tools 

All model components (WORD-PREDICTOR, TAGGER, PARSER) are conditional
probabilistic models of the type $P(y / x_1, x_2, \ldots, x_n)$ where
$y, x_1, x_2, \ldots, x_n$ belong to a mixed bag of words, POStags,
non-terminal labels and parser operations (y only). For simplicity, the
modeling method we chose was deleted interpolation among relative
frequency estimates of different orders $f_n(\cdot)$ using a recursive
mixing scheme:

$$P(y / x_1, \ldots, x_n) = \lambda(x_1, \ldots, x_n) \cdot P(y / x_1, \ldots, x_{n-1}) + (1 - \lambda(x_1, \ldots, x_n)) \cdot f_n(y / x_1, \ldots, x_n), \qquad (5)$$

$$f_{-1}(y) = uniform(vocabulary(y)) \qquad (6)$$



As can be seen, the context mixing scheme discards items in the context in
right-to-left order. The λ coefficients are tied based on the range of the
count $C(x_1, \ldots, x_n)$. The approach is a standard one which doesn't
require an extensive description given the literature available on it
(Jelinek and Mercer, 1980).
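
The recursive mixing scheme of equations (5) and (6) can be sketched as below, evaluated from the uniform distribution upward to the full context. This is an illustration under stated assumptions, not the code used in the experiments: the class name, the event layout and the `lam_for_count` callback (which stands in for the tied, count-range-dependent λ coefficients) are hypothetical, and λ is forced to 1 for unseen contexts so that the estimate stays normalized.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Sequence, Tuple


class DeletedInterpolation:
    def __init__(self, events: Iterable[Tuple[Hashable, Sequence[Hashable]]],
                 vocab: Iterable[Hashable],
                 lam_for_count: Callable[[int], float]) -> None:
        self.vocab_size = len(set(vocab))
        self.lam_for_count = lam_for_count  # assumption: lambda tied by count only
        self.joint = Counter()              # C(y, x_1..x_n) for every order n
        self.context = Counter()            # C(x_1..x_n) for every order n
        for y, ctx in events:
            ctx = tuple(ctx)
            for n in range(len(ctx) + 1):
                self.joint[(y, ctx[:n])] += 1
                self.context[ctx[:n]] += 1

    def prob(self, y: Hashable, ctx: Sequence[Hashable]) -> float:
        ctx = tuple(ctx)
        p = 1.0 / self.vocab_size           # f_{-1}: uniform over the vocabulary
        for n in range(len(ctx) + 1):       # shorter contexts first
            count = self.context[ctx[:n]]
            if count == 0:
                continue                    # unseen context: lambda = 1, back off fully
            lam = self.lam_for_count(count)
            f_n = self.joint[(y, ctx[:n])] / count
            p = lam * p + (1.0 - lam) * f_n
        return p
```

Dropping the last context item first corresponds to the right-to-left discarding order described above; for the WORD-PREDICTOR with context (h_0, h_{-1}) this means h_{-1} is the first item to be ignored when backing off.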



3.2 Search Strategy 

Since the number of parses for a given word prefix $W_k$ grows
exponentially with k, $|\{T_k\}| \sim O(2^k)$, the state space of our model
is huge even for relatively short sentences, so we had to use a search
strategy that prunes it. Our choice was a synchronous multi-stack search
algorithm which is very similar to a beam search.

Each stack contains hypotheses — partial parses 
— that have been constructed by the same number of 
predictor and the same number of parser operations. 
The hypotheses in each stack are ranked according 
to the ln(P(W, T)) score, highest on top. The width 
of the search is controlled by two parameters: 

• the maximum stack depth — the maximum num- 
ber of hypotheses the stack can contain at any given 
state; 

• log-probability threshold — the difference between 
the log-probability score of the top-most hypothesis 
and the bottom-most hypothesis at any given state 
of the stack cannot be larger than a given threshold. 

Figure 7 shows schematically the operations associated with the scanning of
a new word $w_{k+1}$. The above pruning strategy proved to be insufficient
so we chose to also discard all hypotheses whose score is more than the
log-probability threshold below the score of the topmost hypothesis. This
additional pruning step is performed after all hypotheses in stage k' have
been extended with the null parser transition and thus prepared for
scanning a new word.
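
The two width parameters act on a stack as in the following sketch, which keeps at most a fixed number of hypotheses and drops those scoring too far below the best one. The function name and the representation of a hypothesis as a (log-probability, parse) pair are assumptions made for illustration; the bookkeeping of the multi-stack search itself is not shown.

```python
from typing import Any, List, Tuple

Hypothesis = Tuple[float, Any]   # (ln P(W, T), partial parse)


def prune_stack(hypotheses: List[Hypothesis], max_depth: int,
                logprob_threshold: float) -> List[Hypothesis]:
    """Return the surviving hypotheses, best first."""
    ranked = sorted(hypotheses, key=lambda h: h[0], reverse=True)[:max_depth]
    if not ranked:
        return ranked
    best_score = ranked[0][0]
    return [h for h in ranked if best_score - h[0] <= logprob_threshold]
```

With the settings quoted in the experiments section (stack depth 10 and a threshold of 6.91 nats), a hypothesis is discarded once its probability falls below roughly 1/1000 of that of the top-most hypothesis in the stack.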

3.3 Word Level Perplexity 

The conditional perplexity calculated by assigning to a whole sentence the
probability:

$$P(W / T^*) = \prod_{k=0}^{n} P(w_{k+1} / W_kT_k^*), \qquad (7)$$

where $T^* = \arg\max_T P(W, T)$, is not valid because it is not causal:
when predicting $w_{k+1}$ we use $T^*$ which was determined by looking at
the entire sentence. To be able to compare the perplexity of our model with
that resulting from the standard trigram approach, we need to factor in the
entropy of guessing the correct parse $T_k^*$ before predicting $w_{k+1}$,
based solely on the word prefix $W_k$.

Figure 7: One search extension cycle

The probability assignment for the word at position k + 1 in the input
sentence is made using:

$$P(w_{k+1} / W_k) = \sum_{T_k \in S_k} P(w_{k+1} / W_kT_k) \cdot \rho(W_k, T_k), \qquad (8)$$

$$\rho(W_k, T_k) = P(W_kT_k) / \sum_{T_k \in S_k} P(W_kT_k) \qquad (9)$$

which ensures a proper probability over strings $W^*$, where $S_k$ is the
set of all parses present in our stacks at the current stage k.
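
Equations (8) and (9) amount to a weighted average of the per-parse predictions, with weights obtained by normalizing the probabilities of the parses currently held in the stacks. The sketch below assumes that each stack entry is given as a pair (ln P(W_k T_k), P(w_{k+1} / W_k T_k)); this layout, the function names and the perplexity helper are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def next_word_probability(entries: Sequence[Tuple[float, float]]) -> float:
    """Left-to-right estimate of P(w_{k+1} / W_k) from the parses in S_k."""
    max_log = max(log_p for log_p, _ in entries)
    # Subtracting the maximum before exponentiating guards against underflow.
    weights = [math.exp(log_p - max_log) for log_p, _ in entries]
    norm = sum(weights)
    return sum((w / norm) * p_word for w, (_, p_word) in zip(weights, entries))


def perplexity(word_probs: Sequence[float]) -> float:
    """Word-level perplexity from the per-word probabilities of a test text."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
```

Because the weights sum to one and each per-parse prediction is a proper distribution over the vocabulary, the resulting estimate is also a proper distribution, whatever subset of parses survived pruning.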

Another possibility for evaluating the word level perplexity of our model
is to approximate the probability of a whole sentence:

$$P(W) = \sum_{k=1}^{N} P(W, T^{(k)}) \qquad (10)$$

where $T^{(k)}$ is one of the "N-best" parses for W, in the sense defined by
our search. This is a deficient probability assignment, however useful for
justifying the model parameter re-estimation.

The two estimates (8) and (10) are both consistent in the sense that if the
sums are carried over all possible parses we get the correct value for the
word level perplexity of our model.

3.4 Parameter Re-estimation 

The major problem we face when trying to reestimate the model parameters is
the huge state space of the model and the fact that dynamic programming
techniques similar to those used in HMM parameter re-estimation cannot be
used with our model. Our solution is inspired by an HMM re-estimation
technique that works on pruned (N-best) trellises (Byrne et al., 1998).

Let $(W, T^{(k)}), k = 1 \ldots N$ be the set of hypotheses that survived
our pruning strategy until the end of the parsing process for sentence W.
Each of them was produced by a sequence of model actions, chained together
as described in section 2; let us call the sequence of model actions that
produced a given (W, T) the derivation(W, T).

Let an elementary event in the derivation(W, T) be
$(y_l^{(m_l)}, x_l^{(m_l)})$ where:

• l is the index of the current model action;

• $m_l$ is the model component (WORD-PREDICTOR, TAGGER, PARSER) that takes
action number l in the derivation(W, T);

• $y_l^{(m_l)}$ is the action taken at position l in the derivation:
if $m_l$ = WORD-PREDICTOR, then $y_l^{(m_l)}$ is a word;
if $m_l$ = TAGGER, then $y_l^{(m_l)}$ is a POStag;
if $m_l$ = PARSER, then $y_l^{(m_l)}$ is a parser-action;

• $x_l^{(m_l)}$ is the context in which the above action was taken:
if $m_l$ = WORD-PREDICTOR or PARSER, then
$x_l^{(m_l)}$ = (h_0.tag, h_0.word, h_{-1}.tag, h_{-1}.word);
if $m_l$ = TAGGER, then
$x_l^{(m_l)}$ = (word-to-tag, h_0.tag, h_{-1}.tag).



The probability associated with each model action is determined as
described in section 3.1, based on counts $C^{(m)}(y^{(m)}, x_n^{(m)})$,
one set for each model component.

Assuming that the deleted interpolation coefficients and the count ranges
used for tying them stay fixed, these counts are the only parameters to be
re-estimated in an eventual re-estimation procedure; indeed, once a set of
counts $C^{(m)}(y^{(m)}, x_n^{(m)})$ is specified for a given model m, we
can easily calculate:

• the relative frequency estimates $f_n^{(m)}(y^{(m)} / x_n^{(m)})$ for all
context orders n = 0 ... maximum-order(model(m));

• the count $C^{(m)}(x_n^{(m)})$ used for determining the
$\lambda(x_n^{(m)})$ value to be used with the order-n context
$x_n^{(m)}$.

This is all we need for calculating the probability of an elementary event
and then the probability of an entire derivation.



One training iteration of the re-estimation proce- 
dure we propose is described by the following algo- 
rithm: 

N-best parse development data;      // counts.Ei
// prepare counts.E(i+1)
for each model component c {
    gather_counts development model_c;
}

In the parsing stage we retain for each "N-best" hypothesis
$(W, T^{(k)}), k = 1 \ldots N$, only the quantity

$$\phi(W, T^{(k)}) = P(W, T^{(k)}) / \sum_{k=1}^{N} P(W, T^{(k)})$$

and its derivation(W, T^{(k)}). We then scan all the derivations in the
"development set" and, for each occurrence of the elementary event
$(y^{(m)}, x_n^{(m)})$ in derivation(W, T^{(k)}) we accumulate the value
$\phi(W, T^{(k)})$ in the $C^{(m)}(y^{(m)}, x_n^{(m)})$ counter to be used
in the next iteration.
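
The count accumulation for one sentence can be sketched as follows. Each surviving hypothesis contributes the normalized weight φ to the counter of every elementary event in its derivation. The data layout, with a derivation given as a list of (component, action, context) triples, and the function name are assumptions made for the example; the contexts in the test are shortened for readability.

```python
import math
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple

Event = Tuple[str, Any, Any]          # (model component, action y, context x)
Counts = DefaultDict[str, Dict[Tuple[Any, Any], float]]


def new_counts() -> Counts:
    return defaultdict(lambda: defaultdict(float))


def accumulate_counts(counts: Counts,
                      nbest: List[Tuple[float, List[Event]]]) -> Counts:
    """Add the expected event counts of one sentence.

    nbest holds (ln P(W, T^(k)), derivation(W, T^(k))) for the N surviving parses.
    """
    max_log = max(log_p for log_p, _ in nbest)
    weights = [math.exp(log_p - max_log) for log_p, _ in nbest]
    norm = sum(weights)
    for weight, (_, derivation) in zip(weights, nbest):
        phi = weight / norm           # normalized weight of this parse
        for component, y, x in derivation:
            counts[component][(y, x)] += phi
    return counts
```

Calling `accumulate_counts` once per development sentence, starting from `new_counts()`, yields the counts for the next iteration; the relative frequencies of section 3.1 are then recomputed from them.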

The intuition behind this procedure is that $\phi(W, T^{(k)})$ is an
approximation to the $P(T^{(k)} / W)$ probability which places all its mass
on the parses that survived the parsing process; the above procedure simply
accumulates the expected values of the counts
$C^{(m)}(y^{(m)}, x_n^{(m)})$ under the $\phi(W, T^{(k)})$ conditional
distribution. As explained previously, the $C^{(m)}(y^{(m)}, x_n^{(m)})$
counts are the parameters defining our model, making our procedure similar
to a rigorous EM approach (Dempster et al., 1977).

A particular — and very interesting — case is that 
of events which had count zero but get a non-zero 
count in the next iteration, caused by the "N-best" 
nature of the re-estimation process. Consider a given 
sentence in our "development" set. The "N-best" 
derivations for this sentence are trajectories through 
the state space of our model. They will change 
from one iteration to the other due to the smooth- 
ing involved in the probability estimation and the 
change of the parameters — event counts — defin- 
ing our model, thus allowing new events to appear 
and discarding others through purging low probabil- 
ity events from the stacks. The higher the number 
of trajectories per sentence, the more dynamic this 
change is expected to be. 

The results we obtained are presented in the experiments section. All the
perplexity evaluations were done using the left-to-right formula (8)
(L2R-PPL) for which the perplexity on the "development set" is not
guaranteed to decrease from one iteration to another. However, we believe
that our re-estimation method should not increase the approximation to
perplexity based on (10) (SUM-PPL), again on the "development set"; we rely
on the consistency property outlined at the end of section 3.3 to correlate
the desired decrease in L2R-PPL with that in SUM-PPL. No claim can be made
about the change in either L2R-PPL or SUM-PPL on test data.





Figure 8: Binarization schemes 



3.5 Initial Parameters 

Each model component — WORD-PREDICTOR, 
TAGGER, PARSER — is trained initially from a 
set of parsed sentences, after each parse tree (W, T) 
undergoes: 

• headword percolation and binarization (see section 4);

• decomposition into its derivation(W, T).

Then, separately for each model component m, we:

• gather joint counts $C^{(m)}(y^{(m)}, x_n^{(m)})$ from the derivations
that make up the "development data" using $\phi(W, T) = 1$;

• estimate the deleted interpolation coefficients on joint counts gathered
from "check data" using the EM algorithm.

These are the initial parameters used with the re- 
estimation procedure described in the previous sec- 
tion. 

4 Headword Percolation and 
Binarization 

In order to get initial statistics for our model components we needed to
binarize the UPenn Treebank (Marcus et al., 1995) parse trees and percolate
headwords. The procedure we used was to first percolate headwords using a
context-free (CF) rule-based approach and then binarize the parses by using
a rule-based approach again.

The headword of a phrase is the word that best represents the phrase, all
the other words in the phrase being modifiers of the headword.
Statistically speaking, we were satisfied with the output of an enhanced
version of the procedure described in (Collins, 1996), also known under the
name "Magerman & Black Headword Percolation Rules".

Once the position of the headword within a constituent (equivalent to a CF
production of the type $Z \rightarrow Y_1 \ldots Y_n$, where
$Z, Y_1, \ldots, Y_n$ are non-terminal labels or POStags (only for $Y_i$))
is identified to be k, we binarize the constituent as follows: depending on
the identity of Z, a fixed rule is used to decide which of the two
binarization schemes in Figure 8 to apply. The intermediate nodes created
by the above binarization schemes receive the non-terminal label Z'.



5 Experiments 

Due to the low speed of the parser — 200 wds / min 
for stack depth 10 and log-probability threshold 
6.91 nats (1/1000) — we could carry out the re- 
estimation technique described in section 3.4 on only 
1 Mwds of training data. For convenience we chose 
to work on the UPenn Treebank corpus. The vocab- 
ulary sizes were: 

• word vocabulary: 10k, open — all words outside 
the vocabulary are mapped to the <unk> token; 

• POS tag vocabulary: 40, closed; 

• non-terminal tag vocabulary: 52, closed; 

• parser operation vocabulary: 107, closed; 

The training data was split into a "development" set of 929,564 wds
(sections 00-20) and a "check" set of 73,760 wds (sections 21-22); the test
set size was 82,430 wds (sections 23-24). The "check" set has been used for
estimating the interpolation weights and tuning the search parameters; the
"development" set has been used for gathering/estimating counts; the test
set has been used strictly for evaluating model performance.

Table 1 shows the results of the re-estimation technique presented in
section 3.4. We achieved a reduction in test-data perplexity, bringing an
improvement over a deleted interpolation trigram model whose perplexity was
167.14 on the same training-test data; the reduction is statistically
significant according to a sign test.



| iteration number | DEV set L2R-PPL | TEST set L2R-PPL |
|---|---|---|
| E0 | 24.70 | 167.47 |
| E1 | 22.34 | 160.76 |
| E2 | 21.69 | 158.97 |
| E3 | 21.26 | 158.28 |
| 3-gram | 21.20 | 167.14 |

Table 1: Parameter re-estimation results

Simple linear interpolation between our model and the trigram model:

$$Q(w_{k+1} / W_k) = \lambda \cdot P(w_{k+1} / w_{k-1}, w_k) + (1 - \lambda) \cdot P(w_{k+1} / W_k)$$

yielded a further improvement in PPL, as shown in Table 2. The
interpolation weight was estimated on check data to be λ = 0.36.

An overall relative reduction of 11% over the trigram model has been
achieved.
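
The interpolation and the quoted relative reduction can be checked with a few lines of arithmetic. The sketch below applies the weight 0.36 to the trigram probability, as in the formula above, and computes relative reductions from perplexity figures reported in Tables 1 and 2; the function names are assumptions made for the example.

```python
def interpolate(p_trigram: float, p_structured: float, lam: float = 0.36) -> float:
    """Linear interpolation; lam is the weight of the trigram model."""
    return lam * p_trigram + (1.0 - lam) * p_structured


def relative_reduction(baseline_ppl: float, new_ppl: float) -> float:
    """Relative perplexity reduction with respect to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Interpolated model after iteration E3 against the trigram baseline.
overall = relative_reduction(167.14, 148.90)
# Structured model alone after iteration E3 against the trigram baseline.
alone = relative_reduction(167.14, 158.28)
```

Here `overall` evaluates to about 0.109, which is the 11% figure quoted above, while `alone` is about 0.053, so roughly half of the gain comes from combining the two models.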

| iteration number | TEST set L2R-PPL | TEST set 3-gram interpolated PPL |
|---|---|---|
| E0 | 167.47 | 152.25 |
| E3 | 158.28 | 148.90 |
| 3-gram | 167.14 | 167.14 |

Table 2: Interpolation with trigram results

6 Conclusions and Future Directions

The large difference between the perplexity of our model calculated on the
"development" set (used for model parameter estimation) and on the "test"
set (unseen data) shows that the initial point we choose for the parameter
values has already captured a lot of information from the training data.
The same problem is encountered in standard n-gram language modeling;
however, our approach has more flexibility in dealing with it due to the
possibility of reestimating the model parameters.

We believe that the above experiments show the 
potential of our approach for improved language 
models. Our future plans include: 

• experiment with other parameterizations than the 
two most recent exposed heads in the word predictor 
model and parser; 

• estimate a separate word predictor for left-to- 
right language modeling. Note that the correspond- 
ing model predictor was obtained via re-estimation 
aimed at increasing the probability of the "N-best" 
parses of the entire sentence; 

• reduce vocabulary of parser operations; extreme case: no non-terminal
labels/POS tags, word only model; this will increase the speed of the
parser, thus rendering it usable on larger amounts of training data and
allowing the use of deeper stacks, resulting in more "N-best" derivations
per sentence during re-estimation;

• relax — flatten — the initial statistics in the re- 
estimation of model parameters; this would allow the 
model parameters to converge to a different point 
that might yield a lower word-level perplexity; 

• evaluate model performance on n-best sentences 
output by an automatic speech recognizer. 